A Statistical Parser for Czech

Authors

  • Michael Collins
  • Jan Hajic
  • Lance A. Ramshaw
  • Christoph Tillmann
Abstract

This paper considers statistical parsing of Czech, which differs radically from English in at least two respects: (1) it is a highly inflected language, and (2) it has relatively free word order. These differences are likely to pose new problems for techniques that have been developed on English. We describe our experience in building on the parsing model of (Collins 97). Our final results – 80% dependency accuracy – represent good progress towards the 91% accuracy of the parser on English (Wall Street Journal) text.

Similar resources

Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors

In this paper, we present two dependency parser training methods appropriate for parsing outputs of statistical machine translation (SMT), which pose problems to standard parsers due to their frequent ungrammaticality. We adapt the MST parser by exploiting additional features from the source language, and by introducing artificial grammatical errors in the parser training data, so that the trai...

A Concept for a Prosodically and Statistically Driven Chunky Semantic Parser

(Czech Republic), 1998, (pp. 357–362). A Concept for a Prosodically and Statistically Driven Chunky Semantic Parser. Jürgen Haas, Manuela Boros, Elmar Nöth, Volker Warnke, Heinrich Niemann. University of Erlangen-Nürnberg, Chair for Pattern Recognition – Martensstraße 3 – 91058 Erlangen – Germany. (haas,boros,noeth,warnke,niemann)@informatik.uni-erlangen.de. Abstract. In spoken dialog systems ...

Influence of Parser Choice on Dependency-Based MT

The accuracy of dependency parsers is one of the key factors limiting the quality of dependency-based machine translation. This paper deals with the influence of various dependency parsing approaches (and of different training data sizes) on the overall performance of an English-to-Czech dependency-based statistical translation system implemented in the Treex framework. We also study the relationsh...

On the Rule-Based Parsing of Czech

This paper presents various attempts to adapt the Rule-based Approach (RBA), as originally introduced by Eric Brill in 1993 (Brill 1993a), to the problem of parsing Czech (a highly inflective language) within a dependency-based syntactic framework (Sgall et al. 1986). The experiments in this paper support the claim that modifying RBA for the stated aim is neither simple nor straightf...

Test Suite for the Czech Parser Synt

This paper presents a set of tools designed for testing synt, the Czech syntax parser that is being developed at the Natural Language Processing Centre at Masaryk University. Testing the parser against a newly created corpus of phrasal trees is very important for future development of the parser and its grammar. The usage of the test suite is not restricted to the synt parser but is open to wid...

Publication date: 1999